Abstract

In this report, an alternative item response theory (IRT) observed score equating method 
was newly developed. The proposed equating method was illustrated with two real data 
sets and the equating results were compared to those of traditional IRT true score and 
IRT observed score equating methods. Using three loss indices, the new method appeared 
to produce equating equivalents more similar to those of the IRT observed score equating 
than those of the IRT true score equating. In addition to the conversion relationships 
between new form scores and their equating equivalents on the old form scale, the 
bootstrap standard errors of equating were provided and compared for the three IRT 
equating methods. These methods performed similarly. 

Introduction 

The number-correct scores from different test forms often need to be equated for the
purpose of evaluating examinees' proficiency across different forms or years. Traditionally,
under item response theory (IRT), there are two equating methods to adjust the raw test
scores of the new form X onto the old form Y metric (Lord, 1982): IRT true score equating
(IRT-TSE) and IRT observed score equating (IRT-OSE). The former finds the
equivalent score on the Y metric, $\varphi(x)$, for an observed score x on form X using the test
characteristic curves of both forms, each of which defines the relationship between
person location parameters (i.e., $\theta$) and the corresponding true test scores. The latter applies
the traditional equipercentile equating method after constructing the expected raw score
distributions of the two test forms, which are typically obtained with the recursive
algorithm (Lord & Wingersky, 1984; Thissen, Pommerich, Billeaud, & Williams, 1995).

IRT-OSE has explicit advantages over IRT-TSE because IRT-OSE deals with the observed
scores of actual interest, whereas treating estimated true scores as substitutes for observed
scores under IRT-TSE can be controversial. Also, whereas the IRT-TSE
method cannot produce equating equivalents for a perfect score or for an observed score x less
than the sum of the guessing parameters under the 3-parameter logistic model, the IRT-OSE
method can (Han, Kolen, & Pohlmann, 1997; Harris & Crouse, 1993; Kolen & Brennan,
2004; Lord, 1977). In practical situations, however, IRT-TSE has been widely used as an
alternative to IRT-OSE because it is easier to conduct and does not require the use
of any distribution of ability or expected raw scores.

The main goal of this report is to propose an alternative IRT observed score equating
method (AIRT-OSE) that does not rely on the classical equipercentile
equating method. The newly proposed AIRT-OSE method employs the estimated $\theta$ values
associated with each number-correct score for equating the observed scores on the two test
forms. To this end, Thissen and Orlando's (2001) ability estimation method known as
expected a posteriori under summed scoring (EAPss) is used. In this report, the results of the
AIRT-OSE method are compared to those of the traditional IRT-OSE and IRT-TSE methods.
The next section begins with an introduction to EAPss and details the AIRT-OSE procedure.

The AIRT-OSE method 

Expected a posteriori under summed scoring (EAPss) 

For item i on an I-item test, let $u_i = 1$ if an examinee responded correctly and $u_i = 0$
otherwise. Let $\hat{\theta}_x$ denote the EAPss values for students being administered form X.
According to Thissen and Orlando (2001), the EAPss for a student who earned a raw score x
on form X is given by

$$\hat{\theta}_x = \frac{\int \theta\, L_x(\theta)\, d\theta}{\int L_x(\theta)\, d\theta}, \qquad (1)$$

where the integrals are evaluated over quadrature points $\theta_q$ and $L_x(\theta_q)$ is the
likelihood for each score x at a given quadrature point $\theta_q$. The likelihood
$L_x(\theta_q)$ in Equation 1 can be computed as follows:

$$L_x(\theta_q) = \sum_{\mathbf{u} \in x} \prod_{i=1}^{I} P_i(\theta_q)^{u_i}\,[1 - P_i(\theta_q)]^{1-u_i}\,\phi(\theta_q), \qquad (2)$$

where $\phi(\theta_q)$ represents a discrete distribution emulating the population density,
$\mathbf{u}$ denotes the item response pattern of an examinee such that $x = \sum_i u_i$, and
$P_i(\theta_q)$ is the probability of the correct response to item i at a given $\theta_q$.
$P_i(\theta_q)$ is computed under the IRT model employed for analyzing the test data. For
example, in the case of the 3-parameter logistic model (3PLM), it is

$$P_i(\theta_q) = c_i + (1 - c_i)\,\frac{1}{1 + \exp[-1.7\,a_i(\theta_q - b_i)]}, \qquad (3)$$

where $a_i$, $b_i$, and $c_i$ denote the discrimination, difficulty, and guessing parameter
estimates, respectively. Under the EAPss approach, consequently, the ability estimate will be
the same for all students having the same raw score on form X regardless of their response
patterns. The EAPss for a student who got the number-correct score y on form Y (i.e.,
$\hat{\theta}_y$) can be computed using the same procedure.
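The computation in Equations 1 through 3 can be sketched as follows. This is a minimal illustration, not the report's implementation: it assumes the summed-score likelihoods are accumulated item by item with the recursive algorithm cited in the Introduction, and the item parameters, the quadrature grid, and the names `p_3pl`, `summed_score_likelihoods`, and `eap_ss` are all hypothetical.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (Equation 3)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def summed_score_likelihoods(theta, items):
    """Likelihood of each summed score 0..I at one theta, built item by item."""
    like = [1.0]  # with no items, the only possible score is 0
    for a, b, c in items:
        p = p_3pl(theta, a, b, c)
        new = [0.0] * (len(like) + 1)
        for score, value in enumerate(like):
            new[score] += value * (1.0 - p)   # item answered incorrectly
            new[score + 1] += value * p       # item answered correctly
        like = new
    return like


def eap_ss(items, nodes, weights):
    """EAPss for every summed score, with the integrals replaced by sums."""
    n_scores = len(items) + 1
    num = [0.0] * n_scores
    den = [0.0] * n_scores
    for theta, w in zip(nodes, weights):
        like = summed_score_likelihoods(theta, items)
        for x in range(n_scores):
            num[x] += theta * like[x] * w   # w plays the role of phi(theta_q)
            den[x] += like[x] * w
    return [n / d for n, d in zip(num, den)]


# Hypothetical (a, b, c) estimates for a four-item form.
items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.25), (1.5, 1.0, 0.15)]
# Assumed quadrature: 81 equally spaced points with standard normal weights
# (left unnormalized, since the constant cancels in the ratio).
nodes = [-4.0 + 0.1 * k for k in range(81)]
weights = [math.exp(-0.5 * t * t) for t in nodes]
theta_hat = eap_ss(items, nodes, weights)  # one EAPss per raw score 0..4
```

Because every response pattern with the same summed score shares one likelihood, `theta_hat` holds a single value per raw score, which is what makes the score-to-EAPss conversion usable for equating.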

The AIRT-OSE procedure 

Among various linking designs are the common-item-test design and the random-equivalent-group
design. For the former, the item parameters can be placed on the same
metric via several approaches using a set of common items (e.g., mean-sigma method, test
characteristic curve method, concurrent calibrations, etc.). Under a random-equivalent-group
design, separate calibrations produce item parameter estimates that are considered to be on
the same scale. Once the item parameter estimates of two forms are placed onto a common
scale, the $\hat{\theta}_x$ and $\hat{\theta}_y$ values calibrated using them are considered to
be on the same metric.

When the EAPss estimates of two forms X and Y are available, the procedure for
implementing AIRT-OSE can be summarized as follows:

1. Specify an observed score x on form X.

2. Find the EAPss ($\hat{\theta}_x$) that corresponds to the observed score x. Let the
magnitude of this EAPss be represented by $\theta_t$.

3. Find the equating equivalent, $\varphi(x)$, on the form Y scale that corresponds to $\theta_t$.

Typically, most $\varphi(x)$ values resulting from Step 3 will not be whole numbers. This is
because the nonlinear relationship between the EAPss values and the raw scores is one-to-one
and unique to each form, so $\theta_t$ rarely coincides with the EAPss of a whole-number score
on form Y. Thus, to estimate $\varphi(x)$ in Step 3, a few possible interpolation
methods are suggested in this report and will be explained following an example detailing the
3-step process presented above.

Assume there are two 40-dichotomous-item test forms X and Y, which share a set of
common items. Upon successful calibration of each form, their item parameter estimates are
placed on a common scale through, for example, the Stocking and Lord (1983) method. The
conversions between the EAPss and the observed raw scores for each form are then
computed using these item parameter estimates, which are on the same metric. Figure 1
exhibits the plots of the conversions for this illustrative example.




Figure 1. An illustrative equating based on the 3-step process of AIRT-OSE. 

In Figure 1, for an examinee having an observed score x = 20 on the new form X, the
corresponding EAPss is found to be $\theta_t = -.82$ and the corresponding equivalent on the form
Y scale, $\varphi(x = 20)$, is found to be between y = 17 and y = 18. To decide a point estimate
$\hat{\varphi}(x = 20)$, three methods are considered: a polynomial
curve fitting (PCF) approach, a linear spline interpolation (LSI)
approach, and a cubic spline interpolation (CSI) approach. The three approaches are
discussed below.

Under the PCF method, the following nth-degree polynomial is employed to fit the score
points of the old form Y:

$$y = \beta_n \theta^n + \beta_{n-1}\theta^{n-1} + \cdots + \beta_1 \theta + \beta_0 + e, \qquad (4)$$

where y and $\theta$ are the observed raw score for form Y and the corresponding EAPss,
respectively. The $\beta$s represent the fitting coefficients, and $\beta_n$ is expected to be
larger than zero. The degree of the polynomial, n, is taken to be an odd integer (e.g., 3, 5,
7, 9, etc.) in this report. In Equation 4, the $\beta$s can be estimated, for example, using the
built-in routine POLYFIT of the computer software MATLAB (The MathWorks, 2003). For the
previous example, the estimate of $\varphi(x = 20)$ is given as follows:

$$\hat{\varphi}(x = 20) = \hat{\beta}_n \theta_t^n + \hat{\beta}_{n-1}\theta_t^{n-1} + \cdots + \hat{\beta}_1 \theta_t + \hat{\beta}_0.$$
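As a rough illustration of the PCF step, the sketch below fits Equation 4 by ordinary least squares and evaluates the fitted polynomial at $\theta_t$. It is a stand-in for the MATLAB routine rather than a reproduction of it: the normal-equation solver, the function names, and the nine (EAPss, raw score) pairs are assumptions made for the example.

```python
def polyfit_least_squares(thetas, ys, degree):
    """Least-squares coefficients [b0, b1, ..., bn] for Equation 4."""
    m = degree + 1
    # Normal equations (X'X) b = X'y for the design matrix with columns theta^k.
    xtx = [[sum(t ** (r + c) for t in thetas) for c in range(m)] for r in range(m)]
    xty = [sum((t ** r) * y for t, y in zip(thetas, ys)) for r in range(m)]
    aug = [row + [rhs] for row, rhs in zip(xtx, xty)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, m + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    coefs = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][k] * coefs[k] for k in range(r + 1, m))
        coefs[r] = (aug[r][m] - tail) / aug[r][r]
    return coefs


def polyval(coefs, theta):
    """Evaluate the polynomial at theta (Horner's rule, coefs from b0 to bn)."""
    result = 0.0
    for b in reversed(coefs):
        result = result * theta + b
    return result


# Hypothetical EAPss values for raw scores 0..8 on a short form Y.
theta_y = [-2.1, -1.6, -1.2, -0.8, -0.3, 0.2, 0.8, 1.5, 2.3]
scores_y = list(range(9))
coefs = polyfit_least_squares(theta_y, scores_y, degree=3)
phi_hat = polyval(coefs, -0.82)  # unrounded equivalent for theta_t = -.82
```

Solving the normal equations directly becomes numerically fragile as the degree grows, so for high-degree fits a library routine based on an orthogonal decomposition would be the safer choice.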

The second approach for estimating $\varphi(x)$ is the linear spline interpolation (LSI).
In mathematics, a spline is a function constructed of piecewise polynomial functions. The
piecewise functions are connected at the endpoints of contiguous intervals with a certain
degree of smoothness for the resulting function. An extensive explanation of splines is
provided by de Boor (2001). For LSI, the piecewise function for each interval is linear, which
is the simplest spline. Under LSI, $\hat{\varphi}(x = 20)$ in the interval (17, 18) for the
previous example can be estimated by the following:

$$\hat{\varphi}(x = 20) = \frac{\theta_t - \hat{\theta}_{y=17}}{\hat{\theta}_{y=18} - \hat{\theta}_{y=17}}\,(18 - 17) + 17.$$
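The LSI formula generalizes to any bracketing pair of form Y scores. The sketch below is one way to write it, assuming the form Y EAPss values are stored in a list indexed by raw score and are strictly increasing; the function name and the five EAPss values are hypothetical.

```python
from bisect import bisect_right


def lsi_equivalent(theta_t, theta_y):
    """Linear interpolation of the form Y equivalent for a given theta_t.

    theta_y[y] is the EAPss for raw score y on form Y and must be strictly
    increasing. theta_t is assumed to lie within the range of theta_y.
    """
    if not theta_y[0] <= theta_t <= theta_y[-1]:
        raise ValueError("theta_t lies outside the range of form Y EAPss values")
    upper = bisect_right(theta_y, theta_t)  # first score whose EAPss exceeds theta_t
    if upper == len(theta_y):               # theta_t equals the largest EAPss
        return float(len(theta_y) - 1)
    lower = upper - 1
    fraction = (theta_t - theta_y[lower]) / (theta_y[upper] - theta_y[lower])
    return lower + fraction                 # adjacent scores differ by one point


# Hypothetical EAPss values for raw scores 0..4 on form Y.
theta_y = [-2.0, -1.0, -0.5, 0.5, 2.0]
phi_hat = lsi_equivalent(-0.75, theta_y)    # halfway between y = 1 and y = 2
```

Values of $\theta_t$ outside the range of the form Y EAPss values are rejected here because they call for a separate rule rather than extrapolation.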

The third approach for estimating $\varphi(x)$ considered in this report is cubic spline
interpolation (CSI). For this report, another MATLAB routine, CSAPS, which is based on the
cubic spline method introduced in Schoenberg (1964) and Reinsch (1967), is used for
implementing CSI. In contrast to LSI, which uses linear piecewise functions, the cubic spline
applies a third-order polynomial, S, within each interval. As de Boor (2001) explained, the
function of a cubic spline curve can be obtained by minimizing

$$p \sum_{i=1}^{I} \{y_i - S(\theta_i)\}^2 + (1 - p) \int \{S''(t)\}^2\, dt, \qquad (5)$$

where i indicates each data point (e.g., I = 41 when a test has 40 dichotomous items). The
first and second terms in Equation 5 are called the error measure and the roughness measure,
respectively, while the degree of smoothness is controlled by the smoothing parameter
$p \in [0, 1]$. The smaller the p value, the smoother the spline. For p = 0, the fitted curve
reduces to the ordinary least squares (OLS) straight line. The choice of p = 1 produces an
unsmoothed curve that passes through all data points. In this report, four levels of smoothness
with p = 1.00, 0.75, 0.50, and 0.25 are considered.

For the IRT observed score equating methods (i.e., both IRT-OSE and AIRT-OSE)
under examination in this study, an operational rule was applied in deciding $\hat{\varphi}(x)$
values: If $\hat{\varphi}(x)$ is less than zero, it is set to zero, and when $\hat{\varphi}(x)$ is
higher than the perfect score of the old form Y, it is assigned the perfect score of form Y.
This rule was adopted for the practical reason that IRT-OSE or AIRT-OSE could produce
$\hat{\varphi}(x)$ lower than zero or higher than the perfect score. Under AIRT-OSE, for example,
in Figure 1, $\hat{\varphi}(x = 2)$ was set to zero rather than a negative value. In other words,
when an EAPss estimate $\hat{\theta}_x$ is lower than the minimum of $\hat{\theta}_y$ (i.e.,
$\hat{\theta}_{y=0}$), the corresponding $\hat{\varphi}(x)$ was set to zero, the lowest raw score.
And, when an EAPss estimate $\hat{\theta}_x$ is higher than the maximum of $\hat{\theta}_y$ (i.e.,
the EAPss for the perfect score on form Y), the corresponding $\hat{\varphi}(x)$ was set to be the
perfect score of form Y.
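The operational rule amounts to restricting every equivalent to the form Y score range. A small sketch is given below, under the assumption that some fitted conversion from $\theta$ to form Y scores is supplied as a function; `apply_operational_rule`, `line`, and the EAPss values are hypothetical names and numbers.

```python
def apply_operational_rule(theta_t, theta_y, interpolate):
    """Return the form Y equivalent for theta_t, restricted to 0..perfect score.

    theta_y[y] is the EAPss for raw score y on form Y (increasing), and
    interpolate is any callable mapping theta to an unrounded form Y score.
    """
    perfect_score = len(theta_y) - 1
    if theta_t < theta_y[0]:        # below the EAPss for y = 0
        return 0.0
    if theta_t > theta_y[-1]:       # above the EAPss for the perfect score
        return float(perfect_score)
    value = interpolate(theta_t)
    # A fitted curve can still leave the score range slightly, so clamp it.
    return min(max(value, 0.0), float(perfect_score))


# Hypothetical, evenly spaced EAPss values for raw scores 0..3 on form Y.
theta_y = [-1.5, -0.5, 0.5, 1.5]


def line(theta):
    """Hypothetical conversion that is exact for the values above."""
    return theta + 1.5


low = apply_operational_rule(-2.3, theta_y, line)   # set to the lowest raw score
mid = apply_operational_rule(0.0, theta_y, line)    # interpolated as usual
high = apply_operational_rule(2.0, theta_y, line)   # set to the perfect score
```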



Evaluation and comparison of the results of the three IRT equating methods 

In the following section, the application of the AIRT-OSE method is illustrated with
two real test data sets. The results of the method are compared to those of IRT-OSE and
IRT-TSE. The comparisons are presented numerically and graphically in terms of the following:

1. Three loss indices, including mean signed difference (MSD), mean absolute difference
(MAD), and root mean squared difference (RMSD), in $\varphi(x)$ values for two different
equating methods (Han, Kolen, & Pohlmann, 1997). Each index is weighted by the
frequency of form X scores. The three loss indices are computed as follows:

$$\mathrm{MSD} = \frac{\sum_{x=0}^{K} [\varphi(x)_A - \varphi(x)_B]\, f_x}{N}, \qquad (6)$$

$$\mathrm{MAD} = \frac{\sum_{x=0}^{K} |\varphi(x)_A - \varphi(x)_B|\, f_x}{N}, \qquad (7)$$

$$\mathrm{RMSD} = \sqrt{\frac{\sum_{x=0}^{K} [\varphi(x)_A - \varphi(x)_B]^2\, f_x}{N}}, \qquad (8)$$

where A and B denote two different IRT equating methods under investigation, $f_x$ is the
observed frequency of score x, K is the perfect score on form X, and N is the sample size of
the group that was administered form X.

2. Patterns of the conversion from form X to form Y scores. Here, $\hat{\varphi}(x) - x$ values
are plotted against the raw score x of form X.

3. The standard errors of equating estimated using the bootstrap method (Efron, 1982; Efron
& Tibshirani, 1993; Kolen & Brennan, 2004). From each of forms X and Y, 500 random
bootstrap samples are drawn. The standard error of equating at a given
raw score x is then estimated by the standard deviation of the 500 $\hat{\varphi}_r(x)$ values,
where r = 1, ..., 500.
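The three loss indices in Equations 6 through 8 reduce to frequency-weighted averages of the differences between two conversions. The sketch below shows the computation on a toy three-score example; the function name and all numbers are hypothetical.

```python
import math


def loss_indices(phi_a, phi_b, freq):
    """Return (MSD, MAD, RMSD) for two sets of equivalents, weighted by freq.

    phi_a[x] and phi_b[x] are the equivalents of raw score x under methods
    A and B, and freq[x] is the number of examinees with form X score x.
    """
    n = sum(freq)
    diffs = [a - b for a, b in zip(phi_a, phi_b)]
    msd = sum(d * f for d, f in zip(diffs, freq)) / n
    mad = sum(abs(d) * f for d, f in zip(diffs, freq)) / n
    rmsd = math.sqrt(sum(d * d * f for d, f in zip(diffs, freq)) / n)
    return msd, mad, rmsd


# Hypothetical equivalents for raw scores 0, 1, 2 and their frequencies.
phi_a = [0.0, 1.5, 2.0]
phi_b = [0.5, 1.0, 2.0]
freq = [1, 2, 1]
msd, mad, rmsd = loss_indices(phi_a, phi_b, freq)
```

Swapping the two methods flips the sign of MSD and leaves MAD and RMSD unchanged, which is why a table comparing methods pairwise shows mirrored MSD entries.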
Two Illustrations

The three IRT equating methods under investigation were applied to real data for two
tests. One is a job skills assessment and the other an academic achievement assessment. Note
that results and analyses presented here are only for illustrative purposes and should not be
viewed from any other perspective.

Job Skills Assessment

This report used data from two forms of a mathematics job skills assessment consisting
of 30 multiple-choice items. The two forms used in the example share 11 common items with
each other. One form (Y) was administered to about 3,000 examinees, and the other form (X)
was taken by about 1,800 examinees. The data for each form were separately calibrated with
the 3-parameter logistic model (3-PLM) using binary logistic models (BILOG; Mislevy &
Bock, 1990). The calibrations converged successfully, and the item parameter estimates for
form X were placed on the form Y scale using the Stocking and Lord (1983) method.
Table 1

Job Skills Assessment Data: Unrounded Equating Equivalent Estimates, $\hat{\varphi}(x)$s, from the Three IRT Equating Methods (IRT-TSE, IRT-OSE, and AIRT-OSE).

similar results in terms of rounded $\hat{\varphi}(x)$s.










As mentioned earlier, IRT-TSE was not able to estimate $\varphi(x)$s for a perfect score or an
observed score x less than the sum of the guessing parameters. To circumvent the former
problem, $\theta = 10$ was used in determining the equating equivalent corresponding to x = 30,
$\hat{\varphi}(x = 30)$, on the test characteristic curve of form Y. To solve the latter problem,
Kolen's (1981) ad hoc procedure was adopted. For instance, $\hat{\varphi}(x = 2)$ was calculated as
(5.35/5.21) × 2 = 2.12 when the sums of guessing parameters for forms X and Y were 5.21
and 5.35, respectively.

For the raw scores of form X ranging from zero through four, $\hat{\varphi}(x)$ values under
AIRT-OSE were found to be zero, which resulted from the operational rule mentioned
earlier. The $\hat{\theta}_x$ values for x = 0, 1, 2, 3, and 4 were smaller than the minimum
$\hat{\theta}_y$, that for y = 0. This can happen when form X is easier than form Y at low
ability levels. Because an examinee who got x = 0, 1, 2, 3, or 4 appeared to be less able than
another examinee who got y = 0 in terms of EAPss, it is reasonable that the $\hat{\varphi}(x)$
values for x = 0, 1, 2, 3, and 4 are zero.



Table 2

Job Skills Assessment Data: MSD, MAD, and RMSD Calculated with Two Sets of $\hat{\varphi}(x)$s

| Equating method | MSD vs. IRT-TSE | MSD vs. IRT-OSE | MAD vs. IRT-TSE | MAD vs. IRT-OSE | RMSD vs. IRT-TSE | RMSD vs. IRT-OSE |
|---|---|---|---|---|---|---|
| IRT-TSE | 0 | 0.039 | 0 | 0.166 | 0 | 0.328 |
| IRT-OSE | -0.039 | 0 | 0.166 | 0 | 0.328 | 0 |
| AIRT-OSE, PCF: 3rd degree poly. | -0.091 | -0.052 | 0.576 | 0.425 | 0.739 | 0.484 |
| AIRT-OSE, PCF: 5th degree poly. | -0.029 | 0.009 | 0.272 | 0.145 | 0.599 | 0.305 |
| AIRT-OSE, PCF: 7th degree poly. | -0.034 | 0.005 | 0.284 | 0.139 | 0.593 | 0.298 |
| AIRT-OSE, PCF: 9th degree poly. | -0.037 | 0.001 | 0.293 | 0.138 | 0.587 | 0.295 |
| AIRT-OSE, LSI | -0.034 | 0.005 | 0.297 | 0.144 | 0.587 | 0.295 |
| AIRT-OSE, CSI: p = 1.00 | -0.038 | 0.001 | 0.297 | 0.142 | 0.586 | 0.295 |
| AIRT-OSE, CSI: p = 0.75 | -0.037 | 0.001 | 0.437 | 0.284 | 0.666 | 0.373 |
| AIRT-OSE, CSI: p = 0.50 | -0.033 | 0.006 | 0.528 | 0.372 | 0.717 | 0.445 |
| AIRT-OSE, CSI: p = 0.25 | -0.011 | 0.027 | 0.578 | 0.421 | 0.749 | 0.490 |

Note. Each entry compares the row method with the column method. MSD = mean signed difference, MAD = mean absolute difference, RMSD = root mean squared
difference, IRT = item response theory, TSE = true score equating, OSE = observed score equating, AIRT =
alternative item response theory, PCF = polynomial curve fitting, LSI = linear spline interpolation, CSI = cubic
spline interpolation.
Table 2 presents the observed values of MSD, MAD, and RMSD computed using the
job skills assessment data. As shown in Table 2, the observed values of MSD, MAD, and
RMSD for IRT-TSE versus IRT-OSE were 0.039, 0.166, and 0.328, respectively. The
magnitudes of the three loss indices were found to be smaller for AIRT-OSE versus
IRT-OSE than for IRT-TSE versus IRT-OSE except for the case of AIRT-OSE with the
third-degree polynomial curve fit. Also, it is noted that the observed MAD values
between IRT-OSE and AIRT-OSE were always smaller than those between IRT-TSE and
AIRT-OSE. The same results were found for RMSD. In summary, the observed values for
the three loss indices given in Table 2 indicate that AIRT-OSE produced results closer to
IRT-OSE than to IRT-TSE. Among the nine AIRT-OSE approaches, PCF with the ninth degree,
LSI, and CSI with p = 1.00 provided the closest results to those for IRT-OSE in terms of
RMSD (= 0.295).




Figure 2. Job skills assessment data: Estimated equating patterns for the three 
IRT equating methods. 

In Figure 2, the relationships between form X scores and their $\hat{\varphi}(x)$s under three IRT
equating methods (IRT-TSE, IRT-OSE, and AIRT-OSE with CSI using p = 1.00) are
plotted. Because, for AIRT-OSE, the three cases (PCF with the ninth degree, LSI, and CSI
with p = 1.00) yielded very similar results in terms of RMSD, only the last one was
considered in the following analysis. The three conversions exhibited in Figure 2 appeared
noticeably different for low scores.

Between IRT-TSE and IRT-OSE, the largest difference was around x = 5, which is
approximately equal to the sum of the guessing parameter estimates. This is consistent
with the finding of Kolen and Brennan (2004). It was interesting that for x = 0, 1, 2, 3, 4, and
5, IRT-TSE provided $\hat{\varphi}(x)$ values higher than x, whereas the $\hat{\varphi}(x)$s under
both IRT-OSE and AIRT-OSE were less than the corresponding x values.



[Plot legend: IRT-TSE, IRT-OSE, AIRT-OSE (CSI, p = 1.00). Horizontal axis: Form X Raw Score.]



Figure 3. Job skills assessment data: Standard errors of equating. 

Figure 3 compares the standard errors for the three IRT equating methods. For each
IRT equating method, the bootstrap standard errors of equating were calculated according to
Kolen and Brennan (2004). In the low x score range, the three equating methods showed
large standard error differences. For x scores higher than 15, similar standard errors of
equating around 0.10 were observed. The largest standard errors for AIRT-OSE, IRT-OSE,
and IRT-TSE were, respectively, found to be 0.69 at x = 5, 0.41 at x = 7, and 0.45 at x = 11.
For x = 8 or higher, the standard errors of AIRT-OSE consistently appeared to be the smallest
in comparison to those of the two other equating methods.
Academic Achievement Assessment



This study used data from two forms of a mathematics achievement assessment with 
each consisting of 60 multiple-choice items. These two forms were, respectively, 
administered to two randomly equivalent groups of examinees. There were about 2,000 
students in each group. The two forms did not share any common items, and each form’s 
item parameters under the 3-PLM were estimated with BILOG (Mislevy & Bock, 1990). 
With the random equivalent group design, the item parameter estimates of the two forms
were considered to be on a common scale upon successful calibration.



Table 3

Academic Achievement Assessment Data: MSD, MAD, and RMSD Calculated with Two Sets of $\hat{\varphi}(x)$s.

| Equating method | MSD vs. IRT-TSE | MSD vs. IRT-OSE | MAD vs. IRT-TSE | MAD vs. IRT-OSE | RMSD vs. IRT-TSE | RMSD vs. IRT-OSE |
|---|---|---|---|---|---|---|
| IRT-TSE | 0 | -0.005 | 0 | 0.024 | 0 | 0.079 |
| IRT-OSE | 0.005 | 0 | 0.024 | 0 | 0.079 | 0 |
| AIRT-OSE, PCF: 3rd degree poly. | -0.205 | -0.210 | 0.659 | 0.661 | 0.742 | 0.733 |
| AIRT-OSE, PCF: 5th degree poly. | 0.005 | 0.000 | 0.074 | 0.061 | 0.105 | 0.071 |
| AIRT-OSE, PCF: 7th degree poly. | 0.006 | 0.001 | 0.053 | 0.036 | 0.095 | 0.048 |
| AIRT-OSE, PCF: 9th degree poly. | 0.006 | 0.001 | 0.044 | 0.025 | 0.091 | 0.038 |
| AIRT-OSE, LSI | 0.007 | 0.002 | 0.043 | 0.024 | 0.092 | 0.038 |
| AIRT-OSE, CSI: p = 1.00 | 0.006 | 0.001 | 0.043 | 0.025 | 0.092 | 0.039 |
| AIRT-OSE, CSI: p = 0.75 | -0.045 | -0.050 | 0.177 | 0.165 | 0.220 | 0.186 |
| AIRT-OSE, CSI: p = 0.50 | -0.087 | -0.092 | 0.362 | 0.357 | 0.416 | 0.398 |
| AIRT-OSE, CSI: p = 0.25 | -0.122 | -0.127 | 0.582 | 0.577 | 0.656 | 0.643 |

Note. Each entry compares the row method with the column method. MSD = mean signed difference, MAD = mean absolute difference, RMSD = root mean squared
difference, IRT = item response theory, TSE = true score equating, OSE = observed score equating, AIRT =
alternative item response theory, PCF = polynomial curve fitting, LSI = linear spline interpolation, CSI = cubic
spline interpolation.

Table 3 presents the observed MSD, MAD, and RMSD computed using data from the
two academic achievement assessment forms for the three IRT equating methods under
study. The overall pattern of the three loss indices in Table 3 appears to be very similar to
that for the job skills assessment in Table 2. Most of the loss index values in Table 3 are much
smaller than the corresponding values in Table 2. However, this might be explained by the
difference in the number of items between the two assessments. In Table 3, for AIRT-OSE
versus IRT-OSE, the observed magnitudes of the loss indices in absolute values were found to
be smaller than those for AIRT-OSE versus IRT-TSE in most of the cases. It can also be seen
that the three AIRT-OSE cases including PCF with ninth-degree polynomials, LSI, and CSI
with p = 1.00 had very similar results for RMSD (i.e., 0.038 or 0.039). Also, the values for
RMSD indicated that AIRT-OSE produced the closest equating results to those of IRT-OSE.
To be consistent with the analyses of the previous real data set, only the case of AIRT-OSE
with CSI (p = 1.00) was included in the following analysis.




Figure 4. Academic achievement assessment data: Estimated equating 
patterns for the three IRT equating methods. 

In Figure 4, the conversion relationships between new form scores and their $\hat{\varphi}(x)$s on
the old form scale are plotted for each IRT equating method. For form X scores ranging from
1 through 12, the three methods showed large differences in the resulting $\hat{\varphi}(x)$s. For
the other x scores, however, they produced very similar $\hat{\varphi}(x)$ values. For the low x
score range, IRT-TSE had $\hat{\varphi}(x)$s higher than the corresponding x scores, whereas both
AIRT-OSE and IRT-OSE yielded $\hat{\varphi}(x)$ values lower than their corresponding x scores.
The largest difference in $\hat{\varphi}(x)$ between IRT-TSE and IRT-OSE occurred at
$x \approx 11$, which is approximately equal to the sum of the guessing parameter estimates.
When x = 2, AIRT-OSE appeared to have the
largest value of $\hat{\varphi}(x) - x$ among the three equating methods.
[Plot horizontal axis: Form X Raw Score.]



Figure 5. Academic achievement assessment data: Standard errors of equating. 

Figure 5 exhibits the bootstrap standard errors of equating calculated for each IRT 
equating method applied to the academic achievement assessment data. The three IRT 
equating methods showed large standard error differences when the form X scores ranged 
between 0 and 11. For the x scores greater than 11, the standard errors of equating for the
three methods appeared to be very similar, and those for AIRT-OSE were the smallest in most
cases.

Discussion 

This report presents an alternative IRT observed score equating method that uses ability 
estimates based on summed scores. Exploiting the one-to-one relationship between EAPss 
and the observed number-correct score, this new method could perform observed score 
equating that technically resembles the IRT-TSE procedure. Because AIRT-OSE is based on 
observed scores, however, it was reasonable that the AIRT-OSE output was closer to that of 
IRT-OSE than that of IRT-TSE in terms of the three loss indices. 

In this study, the AIRT-OSE method produced conversions and bootstrap standard
errors of equating similar to those of the traditional IRT-TSE and IRT-OSE methods. In the
low x score range, however, the three IRT equating methods tended to show very different
equating performance. Depending on the range of the score scale that test users are most
concerned about, the equating methods need to be applied cautiously, as warned
by Kolen and Brennan (2004). For example, if a cut-off score classifying examinees into
master and non-master groups falls in a problematic score region, different equating methods
could lead to different results.

As explained earlier, the implementation of the AIRT-OSE method includes a 3-step
procedure. In the third step, it is necessary to compute $\varphi(x)$ estimates through a curve
fitting or interpolation strategy. To handle this problem, this study attempted three different
approaches including PCF, LSI, and CSI. The analysis outcome from the two empirical data
sets indicated that PCF with the ninth-degree polynomial, LSI, and CSI with p = 1.00
produced similar equating results to those of IRT-OSE. In general, however, the spline
approach tends to be preferred over polynomial curve fitting because (a) PCF may not be
flexible enough to fit various changes in real data points with low-degree polynomials, and
(b) PCF is apt to suffer from the problem of multicollinearity with high-degree polynomials
(Marsh & Cormier, 2001).

To better understand which method is the most appropriate in a given testing situation,
however, further studies need to be conducted, including a simulation study with different
relevant factors (e.g., numbers of items, numbers of examinees, kinds of population
distribution in Equation 2, dichotomous or polytomous IRT models, etc.). Also, it would be
of interest to investigate the performance of AIRT-OSE under the Rasch model. Because a
total test score is the sufficient statistic for an examinee's latent trait in that case, the
AIRT-OSE approach can be applied without resorting to the use of EAPss.